  • Stata freezing forever after -glm- command*

    Hi,

    I am running a simple glm model with the -link(log)- option, and it makes both Stata 14 and 15 unresponsive on both Windows and Mac.

    Code:
    glm ff2 i.Age i.Education i.Occ i.Residency , family(binomial) link(log) eform nolog
    This doesn't happen if I change the link function to logit. Does anyone have any idea how this can be solved?


    Thanks in advance for your insights.

  • #2
    I suspect Stata does not freeze, but that glm has trouble finding a solution and continues to search for a long time. You can see that is the case by removing the nolog option. Why that might be the case depends on your data. Since you haven't said anything about that, there is nothing I can say.
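A minimal sketch of that check, assuming the nlsw88 data shipped with Stata as a stand-in (the original poster's data are not shown). The variable choice and the cap of 50 iterations are illustrative assumptions. Leaving out -nolog- prints the iteration log, and -iterate()- stops the search early so the session stays responsive whether or not the model converges.

```stata
* Illustration only: nlsw88 stands in for the original data
sysuse nlsw88, clear

* No -nolog-, so the iteration log is displayed;
* iterate(50) caps the search (50 is an arbitrary choice)
capture noisily glm married i.race age hours i.south, ///
    family(binomial) link(log) eform iterate(50)

* e(converged) is 1 if the optimizer converged, 0 otherwise
display "return code: " _rc "   converged: " e(converged)
```

A log likelihood that keeps creeping upward by tiny amounts, rather than a frozen screen, points to a slow or failing search.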
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      I’m a little surprised Stata even allows a log link with a binomial family. The mean function is incompatible with the quasi-likelihood. For lots of outcomes on x and parameter values, the exponential mean function can exceed one. Whenever that happens the log likelihood is ill defined. How is ff2 measured? You need to change the link or the family.
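To spell out the point, here is the Bernoulli log likelihood under a log link, written for a binary outcome $y_i \in \{0,1\}$ and linear predictor $x_i\beta$ (the notation is assumed here, not taken from the thread):

```latex
\mu_i = \exp(x_i\beta), \qquad
\ell(\beta) = \sum_{i=1}^{n} \Bigl[\, y_i\, x_i\beta
      + (1 - y_i)\,\log\bigl(1 - \exp(x_i\beta)\bigr) \Bigr]
```

The term $\log\bigl(1 - \exp(x_i\beta)\bigr)$ exists only when $x_i\beta < 0$, so any trial value of $\beta$ that pushes a fitted mean to 1 or above for an observation with $y_i = 0$ leaves the log likelihood undefined.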


      • #4
        It's hard to see the log link could be good for binomial responses. What's the logic there?


        • #5
          Originally posted by Maarten Buis
          I suspect Stata does not freeze, but that glm has trouble finding a solution and continues to search for a long time. You can see that is the case by removing the nolog option. Why that might be the case depends on your data. Since you haven't said anything about that, there is nothing I can say.
          Thanks, Maarten. I just tried the command without -nolog- and it shows a long run of iterations; it finally ended with the error -convergence not achieved-. So perhaps there is something wrong with my equation.


          • #6
            Originally posted by Nick Cox
            It's hard to see the log link could be good for binomial responses. What's the logic there?
            It can give you results in terms of risk ratios instead of odds ratios (logit link function). I am not saying that this is a particularly good reason, but it is a reason sometimes used for the log-link function when dealing with binary dependent variables. Personally I prefer odds ratios.
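A small sketch of the difference, assuming the nlsw88 data shipped with Stata and a single binary predictor (both choices are illustrative, not from the original question). With one 0/1 predictor the model is saturated, so the exponentiated coefficients should match the ratios computed by hand from the two group proportions.

```stata
* Illustration only: nlsw88 ships with Stata; married and south are 0/1
sysuse nlsw88, clear

* Log link: the exponentiated coefficient is a risk ratio
glm married i.south, family(binomial) link(log) eform
scalar rr_glm = exp(_b[1.south])

* Logit link: the exponentiated coefficient is an odds ratio
glm married i.south, family(binomial) link(logit) eform
scalar or_glm = exp(_b[1.south])

* Same quantities computed by hand from the two group proportions
summarize married if south == 1, meanonly
scalar p1 = r(mean)
summarize married if south == 0, meanonly
scalar p0 = r(mean)
display "risk ratio: " rr_glm "  by hand: " p1/p0
display "odds ratio: " or_glm "  by hand: " (p1/(1-p1)) / (p0/(1-p0))
```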
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------


            • #7
              Originally posted by Jeff Wooldridge
              I’m a little surprised Stata even allows a log link with a binomial family. The mean function is incompatible with the quasi-likelihood. For lots of outcomes on x and parameter values, the exponential mean function can exceed one. Whenever that happens the log likelihood is ill defined. How is ff2 measured? You need to change the link or the family.
              Hi Jeff,
              I'm actually not a statistician; I just run some basic analyses for my own work, so I do not know the background math needed to understand which link function suits which family. I tried it because my intention was to calculate risk ratios.
              ff2 is a binary variable which stands for food frequency (adequate, inadequate).


              • #8
                Originally posted by Nick Cox
                It's hard to see the log link could be good for binomial responses. What's the logic there?
                Hi Nick, actually I don't have a concrete reason for using the log link, except that it's the only way I was getting RRs, which is my objective in this analysis. I generally go for ORs but was trying a new flavour.


                • #9
                  As Jeff said this is quite typical when you get predictions larger than 1. In the example below I estimate this model with a Poisson family rather than a binomial family, and show that it results in predictions larger than 1:

                  Code:
                  . sysuse nlsw88, clear
                  (NLSW, 1988 extract)
                  
                  . gen urban = smsa + c_city
                  
                  . label define urban 2 "central city" ///
                  >                    1 "sub-urban" ///
                  >                    0 "rural"
                  
                  . label value urban urban
                  
                  .
                  . poisson married i.race i.urban age hours i.south , irr vce(robust) base
                  
                  Iteration 0:   log pseudolikelihood = -2044.7245  
                  Iteration 1:   log pseudolikelihood = -2044.7245  
                  
                  Poisson regression                              Number of obs     =      2,242
                                                                  Wald chi2(7)      =     169.40
                                                                  Prob > chi2       =     0.0000
                  Log pseudolikelihood = -2044.7245               Pseudo R2         =     0.0160
                  
                  -------------------------------------------------------------------------------
                                |               Robust
                        married |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                           race |
                         white  |          1  (base)
                         black  |   .6873876   .0334256    -7.71   0.000     .6248997    .7561241
                         other  |   1.016493   .1373108     0.12   0.904     .7800484    1.324607
                                |
                          urban |
                         rural  |          1  (base)
                     sub-urban  |   1.024406   .0346499     0.71   0.476     .9586958     1.09462
                  central city  |   .8324088   .0372839    -4.10   0.000     .7624493    .9087874
                                |
                            age |   .9908103   .0049088    -1.86   0.062     .9812359    1.000478
                          hours |   .9911047    .001207    -7.34   0.000     .9887418    .9934732
                                |
                          south |
                             0  |          1  (base)
                             1  |    1.11576   .0357806     3.42   0.001      1.04779     1.18814
                                |
                          _cons |   1.380332   .2750365     1.62   0.106     .9340675    2.039805
                  -------------------------------------------------------------------------------
                  Note: _cons estimates baseline incidence rate.
                  
                  . predict pr
                  (option n assumed; predicted number of events)
                  (4 missing values generated)
                  
                  . sum pr, detail
                  
                                   Predicted number of events
                  -------------------------------------------------------------
                        Percentiles      Smallest
                   1%     .3714456        .292085
                   5%     .3999181       .3078054
                  10%     .4421122       .3527893       Obs               2,242
                  25%     .5400519        .352894       Sum of Wgt.       2,242
                  
                  50%     .6612654                      Mean           .6427297
                                          Largest       Std. Dev.      .1363438
                  75%     .7356569       1.033322
                  90%     .8012611       1.056671       Variance       .0185896
                  95%     .8550637       1.121848       Skewness      -.0713335
                  99%     .9529721       1.132253       Kurtosis       2.656633
                  If I now try to estimate the GLM with a binomial family, it won't converge:

                  Code:
                  . glm married i.race i.urban age hours i.south, link(log) family(binomial)
                  
                  Iteration 0:   log likelihood = -2210.3622  
                  Iteration 1:   log likelihood = -1990.8396  
                  Iteration 2:   log likelihood = -1986.4671  
                  Iteration 3:   log likelihood = -1986.4624  
                  Iteration 4:   log likelihood = -1986.4492  
                  Iteration 5:   log likelihood = -1986.4417  
                  Iteration 6:   log likelihood = -1986.4363  
                  Iteration 7:   log likelihood = -1986.4313  
                  Iteration 8:   log likelihood = -1986.4206  
                  Iteration 9:   log likelihood = -1986.4107  
                  Iteration 10:  log likelihood = -1986.4088  
                  .
                  .
                  .
                  Iteration 1537: log likelihood = -1978.3333  
                  Iteration 1538: log likelihood = -1978.3331  
                  Iteration 1539: log likelihood =  -1978.333  
                  Iteration 1540: log likelihood = -1978.3327  
                  Iteration 1541: log likelihood = -1978.3325  
                  Iteration 1542: log likelihood = -1978.3324  
                  Iteration 1543: log likelihood = -1978.3322  
                  Iteration 1544: log likelihood =  -1978.332  
                  Iteration 1545: log likelihood = -1978.3318  
                  --Break--
                  r(1);
                  So in short, you should not do this.
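One way to check the first point directly is to count the fitted values above 1 after the Poisson fit. This sketch repeats the setup so that it runs on its own; the variable name pr_chk is chosen here for illustration.

```stata
* Rebuild the example data and refit the Poisson model quietly
sysuse nlsw88, clear
gen urban = smsa + c_city
quietly poisson married i.race i.urban age hours i.south, vce(robust)

* Default prediction after -poisson- is the predicted number of events
predict pr_chk

* Fitted values above 1 are impossible as probabilities
count if pr_chk > 1 & !missing(pr_chk)
```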
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------


                  • #10
                    Originally posted by Maarten Buis
                    As Jeff said this is quite typical when you get predictions larger than 1. In the example below I estimate this model with a Poisson family rather than a binomial family, and show that it results in predictions larger than 1
                    But his predictors are
                    Code:
                    i.Age i.Education i.Occ i.Residency
                    Would that be likely in his case?

                    Or does it require fully interacted categorical predictors in order to assure that the predictions lie within the parameter space?


                    • #11
                      Originally posted by Joseph Coveney
                      Or does it require fully interacted categorical predictors in order to assure that the predictions lie within the parameter space?
                      A fully interacted / fully saturated model is guaranteed to remain between zero and one, as it just exactly reproduces the conditional means. Without those interactions you could get predictions larger than 1, even if you only include categorical variables.
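A hedged sketch of the saturated case, assuming the nlsw88 data and the urban variable built as in #9. The names phat_sat, cellmean, and gap are chosen here for illustration. If the fit converges, the predictions should coincide with the observed cell proportions and so cannot exceed 1.

```stata
sysuse nlsw88, clear
gen urban = smsa + c_city

* Saturated: every south-by-urban cell gets its own parameter
glm married i.south##i.urban, family(binomial) link(log)
predict phat_sat          // default prediction is the fitted mean mu

* Observed proportion married in each cell, for comparison
egen cellmean = mean(married), by(south urban)
generate double gap = abs(phat_sat - cellmean)
summarize phat_sat gap
```

Dropping the interaction (i.south i.urban) removes that guarantee, because the additive model no longer has one free parameter per cell.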

                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------


                      • #12
                        Originally posted by Sonnen Blume
                        Does anyone have any idea how this can be solved.
                        Based on #11, try interacting your predictors. And then -margins- or -nlcom- afterward to get at the main effects. (Not sure how well this works for nonlinear or generalized linear models, but it's worth a try.)


                        • #13
                          Originally posted by Maarten Buis

                          A fully interacted / fully saturated model is guaranteed to remain between zero and one, as it just exactly reproduces the conditional means. Without those interactions you could get predictions larger than 1, even if you only include categorical variables.
                          Thanks a lot, Maarten, for the interpretation with examples.

                          About the predicted values, I remember seeing regression plots containing predicted values outside the 0 to 1 range in some R tutorials. So maybe this condition can be relaxed. The funny thing is that the glm command with the log link works for up to 3 IVs. (I find it unfair, but a robot is a robot after all)


                          • #14
                            Originally posted by Joseph Coveney
                            Based on #11, try interacting your predictors. And then -margins- or -nlcom- afterward to get at the main effects. (Not sure how well this works for nonlinear or generalized linear models, but it's worth a try.)
                            Thanks Joseph. I'll surely give it a try.


                            • #15
                              Despite what I wrote, you don't need to use -nlcom-, by the way. It's a generalized linear model, and so you should be able to get at the marginal (main) effects using regular -lincom-.
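A rough sketch of what this could look like, assuming the nlsw88 data and a two-factor interacted model (both illustrative, not the original poster's model). Note that in an interacted model a single coefficient is a contrast at the base level of the other factor, so contrasts at other levels need the interaction term added in.

```stata
sysuse nlsw88, clear
gen urban = smsa + c_city
glm married i.south##i.urban, family(binomial) link(log)

* Risk ratio for south vs. non-south within the base category of urban
lincom 1.south, eform

* Same contrast within urban == 2 (main effect plus interaction term)
lincom 1.south + 1.south#2.urban, eform

* Predicted probabilities by south, averaged over the sample's urban mix
margins south
```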
